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Abstract 

Approximate Bayesian Computation (ABC) are likelihood-free Monte Carlo methods. ABC methods use a comparison 
between simulated data, using different parameters drew from a prior distribution, and observed data. This comparison 
process is based on computing a distance between the summary statistics from the simulated data and the observed data. 
For complex models, it is usually difficult to define a methodology for choosing or constructing the summary statistics. 
Recently, a nonparametric ABC has been proposed, that uses a dissimilarity measure between discrete distributions based on 
empirical kernel embeddings as an alternative for summary statistics. The nonparametric ABC outperforms other methods 
including ABC, kernel ABC or synthetic likelihood ABC. Flowever, it assumes that the probability distributions are discrete, 
and it is not robust when dealing with few observations. In this paper, we propose to apply kernel embeddings using an 
smoother density estimator or Parzen estimator for comparing the empirical data distributions, and computing the ABC 
posterior. Synthetic data and real data were used to test the Bayesian inference of our method. We compare our method 
with respect to state-of-the-art methods, and demonstrate that our method is a robust estimator of the posterior distribution 
in terms of the number of observations. 


1 Introduction 


Many Bayesian applications use inference for nonlinear stochastic models, where it is expensive or difficult to evaluate the 
likelihood; or the normalization constant in Bayesian modelling is also intractable. Approximate Bayesian Computation 
(ABC) are likelihood-free Monte Carlo methods. ABC methods can be em ployed to infer posterior distributions without 
having to evaluate likelihood functions ( Toni et all 2009 : Wilkinson . 2008h . They simulate data from a model with differ¬ 
ent parameter values and compare summary statistics of the simulated data whit summary statistics of the observed data. 
There are many problems associated to how to choose the summary statistics with the goal of obtaining accepted samples. 
The choice of these summaries is frequently not obvious ( Joyce an d Marior aml 200 8[) and in m any cases it is difficult or 
impos sible to construct a general method for finding such statistics ( Fearnhead and Pranglel 2012h . According to Park et al 


(12015 ). the selection of the summary statistics is an important stage in ABC methods that is still an open question. 
Different algorithms for choosing or constructing summary statistic s have been proposed in the literature. In the state-of- 
the-art, we can find linear regression (Fearnhead and Pranglel 2012h . Another option is to use a minimum entropy criterion 
for choosing the summary statistics ( Blum et al. . 20131: Blum , ^id : Nunes and Balding, 2010h . Park et all ( 20151) propose 


a fully nonparametric ABC that avoids to select manually the summary statistics, using kernel embeddings (iGretton et al. 


2007 al) : they employ a two-step process: the first step is to compare the empirical data distributions using a dissimilarity 
distance based on Reproducing Kernel Hilbert Spaces (RKHS). Such a distance is termed Maximum Mean Discrepancy 
(MMD). In the second stage, they use another kernel that operates on probability measures . This nonparam etric ABC 
outperforms other methods, ABC, kernel ABC or synthetic likelihood ABC, for more details see Park et al. ( 20151) . However 
this nonparametric ABC assumes that the probability distributions are discrete and it is not robust when there are few 
observations. 

In this paper, we propose a new metric for com paring two data di stributions in a RKHS with application to ABC simulation. 
The difference with respect to Park et al. in IPark et al.l (l2015h is the assumption that the probability density functions, 
for corresponding probability distributions in RKHS, are continuous probability functions, estimated using an smoother 
density estimator or Parzen estimator. This fact allows us to obtain a biased estimator of the dissimilarity distance between 
two empirical data distributions that can be written as Xf + {1 — A)MMD, where f € Tf {TL is a Hilbert space), and 
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0 < A < 1. It is possible to demonstrate that our estimator presents a lower root mean sq uared error, between th e true 
parameters and the estimated parameter, than the MMD estimator, for any value of A, see Muandet et alJ ( 2013 ). The 
difficulty of calculating dissimilarity metrics for empirical distributions in RKHS is to compute integrals, these integrals 
are usually difficult to solve analytically. In this paper, we use Gaussian kernels for estimating these probability density 
functions. 

We compare ABC based in our metric in RKHS with the ABC and K2ABC proposed by Park et al. ( 20L5h . We also show 
how to use these methods in combination with sequential Monte Carlo methods. 

Experimental results obtained include the application of the different methods described above over a synthetic dataset and 
a real dataset. The synthetic data was created from an uniform mixture model. We demonstrate that our method is a robust 
estimator of the parameter vector in terms of the number o f observations . The real data corresponds to the change of adult 
blowfly population during a period of time, as explained bv lWoodI (l2010l) . 


2 Approximate Bayesian Computation based on Kernel Embeddings 

In this section, we briefly expose the different ABC methods. We then define the metrics based on kernel embeddings 
employed with the ABC methods, including the MMD metric and the metric that we propose in this paper. 


2.1 Short summary on ABC methods 


Approximate Bayesian Computation (ABC) are likelihood-free Monte Carlo methods. In practical Bayesian models, exact 
inference may be intractable due to different reasons: the likelihood function is expensive or intractable; or the normalization 
constant in Bayesian modelling is also intractable. ABC methods can be e mployed to infer posterior distributions without 
having to evaluate likelihood functions ( Toni et all l2009l Wilkinsotl 2008 ). These likelihood-free algorithms simulate data 
using different parameters drew from prior distribution and compare summary statistics of the simulated data (s (D')) whit 
summary statistics of the observed data (s (D)). For this comparison it is necessary to define a tolerance threshold that 
determines the accuracy of the algorithm, and a distance measure p (s (V), s {'D')), for example Euclidean distance, etc. If 
p (s (22), s ( V')) < e, we then accept the parameters 6 drew from p (0), otherwise, it is rejected. 

According to lWilkinsonI (I2008r) . there are different open questions around the ABC algorithm including how to choose the 
measure function p; what should be the value of e?; how to select the summary statistics s? p and e are experimental and 
impleme ntation issues. Wi th respect to s, it is difficult to define a methodology for choosing or constructing the summary 
statistics ( Park et al. . 2015 ). 


2.2 Metrics between probability measures by using kernels 


The embedding of distributions in a Reproducing Kernel Hilbert Space (RKHS) is a me thodology that allows us to compute 
distan ces between distributions by using kernel functions ( Berlinet and Christinel 20041) . According to Srinerumbudur et ^ 
(1201 oh . a metric 7 fc(P, Q) over the probability measures P and Q can be defined through a characteristic kernefl k{x, x') 


as. 


Ik {P, Q) 



k (•, x) dP {x) 



{■,y) dQ (y) 
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If the distributions P{x), and Q{y) admit a density, then we have dP(x) = p{x)dx, and dQ{y) = q{y)dy, and an alternative 
expression for 7 fc (P, Q) can be written as 


'ykiP,Q)= / k{x,z)p{x)q{z)dxdz+ / k{z,y)p{z)q{y)dzdy 
JmJm JmJm 

-2 / k{x,y)p{x)q{y)dxdy. 

Jm Jm 


( 1 ) 


* A characteristic kernel is a reproducing kernel for which 7 fc(P, Q) = 0 
measures on a topological space (M, A). 


P = Q^P^Q where V denotes the set of all Borel probability 
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2.3 Kernel Embeddings as summary statistics for ABC 


Let us assume that we have two random samples X = and Y = {yj}^Zi- tn ABC, one of those samples would 

correspond to the real data {V), whereas the other sample would correspond to simulated data {V) from the model we 
wi sh to estimate. A s mentioned before, we need a way to define if accepting the simulated data. A key idea introduced 
by Park et all ( 2015 ) was to assume that the random samples X, and Y are drawn from probability measures P, and Q, 
respectively, and instead of comparing the distance between samples X, and Y, they propose to compare the distance 
between the pr obability m e asures P and Q. 


The authors in IPark et al.l ( 12015 ) assume empirical distributions for P, and Q, this is p{x) = ~ 


Qiy) — 'W Sfci ~ Vj)- With these expressions forp(a:), and q{y), the distance 7 /c(P, Q) is given by 


Nx 




Ik ^ {xi,Xj) + {yi,yj) - 

i j—\ y i j—\ 

,MMD 


N^,Ny 


N^N, 


k{xi,yj). 


( 2 ) 


y - . 1 

JJ = 1 


W e refer to this distance as Q), since it is rooted in the Maximum Mean Discrepancy (MMD) concept developed 


Gretton et al.l (l2007bi l2012h . After obtaining 7 ““°(P, Q), Park et al. in iParket al.l (12015h apply a second Kernel that 


operates on probability measures, as follows 


ke {P-d,Qv') = exp 




(3) 


where e is a positi ve parameter for t he second kernel. Pp is the distribution associated to V, and Qd' is the distribution 
associated to V'. In IPark et al.l(l2015b . the authors use an unbiased estimate for 7 ^^°(P, <5) in which the factors l/N^, and 
1 /Ny are replaced for 1 / {Nx{Nx — 1)), and l/{Ny{Ny — 1)), respectively. Also, the innermost sum in each of the first two 
terms does not take into account the terms for which i = j. 

Instead of assuming a discrete distribution for p{x), and q{y), in this paper we propose to use a smooth estimate for both 
densities based on the Parzen-window density estimator. We assume that the densities p{x), and q{x) can be estimated using 
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where hp and hq are the kernel bandwidths, and D is the dimensionality of the input space. 

If we use a Gaussian kernel with parameter E for k{x^ x'), and the estimators p{x), and q{y), a new distance between the 
distributions P and Q is easily obtained from expression (|2|i as follows 


1 » 1 ” .. 

lk{P,Q) = Y +-^ Y k [yi.yp^llq) (4) 

X y i j—\ 

2 ^ 

~ ]^ ]\f E/ k{xi,yj\T,p YT,q), 


where 


k {x, x'\ S) 


|E + 5|i/2 


exp 


{x — x')^ {T, + S) ^ {x — x') \ 


In expression (|4|i, Ep = and Eg = h'^1. We refer to the metric in & as 7r“"(P, Q)- 


As a d istance measure 'Yk{Pv, Qv) we can use 7 ^^’^(Pd, Qv) or 7 ^“'^™(Pi 5 , Qt>')- The algorithm proposed bv iPark et al. 


(120151) that employs kernel embeddings of probability measures in ABC using MMD is shown in Algorithm[T] If we use the 
metric 7 ^'^’^(Pd, Q-d') on line 4 for the algorithm, we refer to the method as K2ABC. If we use the metric 7fe““"(pD, Qv) 


instead, we refer to the method as PABC. 
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Algorithm 1 ABC based on Kernel Embeddings 
1: Input: Observed data 1), prior distribution, threshold e. 
2 : for i = 1,..., Ns do 
3: Draw 0 from p (0) 

4: Simulate V using p{-\0) 

5: Compute Wi = exp 

6 : end for 

7: Normalize wt 


2.4 Extension to ABC SMC 


The disadvantages of ABC, suck as: the selection of e , the choice of summary statistics or accepted samples with low 
probability, it can b e avoided using an ABC algorithm based on sequential Monte Carlo methods (ABC SMC) proposed by 
Sisson et'ni (l2007h . The goal of ABC SMC is to obtain an approximation of true posterior using a series of sequential steps, 
expressed hy p{9\ p {V, V) < et), for i = 1, • • •, T, where et is a threshold that decreases in each step (ei, et, > 

ex), thus it refine the approximation towards the target distribution. 

ABC SMC has a first stage based on ABC rejection. We can replace this stage with K2ABC or PABC, leading to what 
we call in the pa per as the K2ABC SMC, and PABC SMC methods. Details of the ABC SMC method can be found in 


Toni et al](l2009l) . 


3 Experimental Setup 

We first describe the synthetic dataset and the real dataset used for our experiments. Finally, we explain the procedure for 
applying the ABC methods over synthetic and real datasets. 


3.1 Datasets 


We follow the experiments described in iPark et al.l (120 15h . Synthetic data is created from an uniform mixture model. The 
real dataset correspond to a time series of an adult blowfly population ( Wood . 201(1) . 


Toy Problem. The synthetic dataset was generated by an uniform mixture model, P (22| 0) = {k,k — 1), where 

V are the observed data; are the mixing coefficients; and K is the number of components. The model parameters 
0 correspond to the mixing coefficients tt = [0.25, 0.04, 0.33, 0.04, 0.34]^. The prior distribution for 0 is a symmetric 
Dirichlet distribution. 


Noisy nonlinear ecological dynamic system For the real dataset example, we would like to infer the parameters of a non¬ 
linear ecological dynamic system ( Meeds and Welling . 2014 ) represented through Nt+i — ^ et + 

Nt exp {det), where Nt is the adult population at time t, and P, Nq, d and r are parameters. Variables e* and 
Et are employed to represent noise, and they are assumed to follow Gamma distributions e* ^ (7 (l/Cp, l/Cp), and 
£t Q The parameter vector we want to infer is given by 0 = [P, TVg, d, r, Up, ad]^ ■ The observed data that 

we use in the experiments correspo nd to a time s eries of the adult population of sheep blowfly, with 180 observations. The 
data was provided by the authors of Wood ( 2010l) . 


3.2 Validation 

We perform a comparison between ABC, K2ABC, PABC, SMCABC, K2ABC SMC, and PABC SMC. The comparison 
is made in terms of the root-mean-squared error (RMSE) between the true parameters (when known) and the estimated 
parameters by the different methods. For the real dataset, we compute the cross-correlation coefficient (p) between the real 
time series and the time series generated by the different simulation methods. 
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3.3 Procedure 


For the toy problem, the observed data V is sample once from the mixture of uniform densities with the mixture coefficients 
6 described above. The observed data contains 400 observations. To generate the simulated data V, we sample from the 
Dirichlet prior, and with that sample, we then generate 400 observations from the mixture model. We then apply the ABC, 
K2ABC, PABC, ABC SMC, K2ABC SMC and PABC SMC to the observed data, and the simulated data, and compute the 
RMSE over the true and estimated parameters. For all the ABC methods, the procedure for generating simulated data is 
repeated 1000 times. For ABC, we use e = 0.002, and compute the mean and standard deviation as summary statistics. For 
ABC SMC, K2ABC SMC and PABC SMC, we employ = (0.5,0.01,0.005,0.001,0.0005). For the ABC SMC 

type of algorithms, we need to specify what is known as a perturbation kernel. For this, we use a spherical multivariate 
Gaussian distribution, with parameter 0.0001. 

For the noisy nonlinear ecological dynamic system, we apply all the ABC methods and calculate the cross-correlation 
coefficient betwe en the observed data (X>), and the simulated data (V), using each method. We adopt similar priors to 
the ones used bv iMeeds and Wellind (12014 ): logP ^ A/" (3, 0.2), log 9 ~ A/" (—1.5,0.1), log Wq AA(6, 0.2), logcr^i ^ 
J\f (-0.1,0.01), log cTp - A/" (0.1,0.01) and T - P (6). _ _ 

For ABC, we use e = 0.35, and compute 8 summary statistics used in iMeeds and Wellind (l2014l) : 4 statistics using the log of 
the mean of all 25% quantiles of { 4 statistics employing the mean of 25% quantiles of the first-order differences 

of { we also compute the maximum and minimum value of ■ For ABC SMC, K2ABC SMC and 

PABC SMC, we employ = (2,1, 0.5, 0.35, 0.25, 0.2, 0.15); we also specify what is known as a perturbation kernel. 

For this, we use a spherical multivariate Gaussian distribution, with parameter 0.0001. 

For K2ABC and PABC, in both experiments, we use a kernel bandwidth optimization approach based on minimizing the 
mean integrated square error (MISE) between the observed data and si mulated data, to dehne the values o f E, and hy. 
Eor more details about this kernel bandwidth optimization approach, see Shimazaki and Shinomoto ( 2010l) . 


4 Results and discussion 

The ABC, K2ABC, PABC, ABC SMC, K2ABC SMC, PABC SMC were evaluated and compared over the synthetic data 
set and the real data set described in section[3] 

4.1 Results from synthetic data 

All ABC methods were applied over vectors of 400 observations obtained from the uniform mixture model described in 
section |3] Pig. [T|shows a comparison among the evaluated methods. The figure presents the estimated E [0| V] for each 
method. 

Eor K2ABC, PABC, K2ABC SMC and PABC SMC the estimated parameters are close to the true posterior mean of the 
parameters (see section|3]for the true parameters). ABC and ABC SMC do not correctly estimate to 63 and 63 . To observe 
the quality of the prediction using each method, we varied the number of observations and compute the RMSE between the 
true parameter vector and the estimated posterior mean of the parameter vector. We increase the observations from 40 to 
400, in steps of 5 observations. Eor each step, we ran all methods and computed the RMSE. Pig. |2]presents the RMSE when 
the number of observations is increased. 

Comparing Pigs. |2(a)| and |2(b)| show that the RMSE obtained by PABC is the steadiest and present the lowest variability, 
indicating a robust estimator of the parameter vector in terms of the number of observations. The mean and one standard 
deviations for the RMSE for PABC was 0.0696 ± 0.0006. The PABC SMC also present a low variability for this metric, the 
RMSE value was 0.0716 ± 0.0022. RMSE obtained by ABC was 0.0879 ± 0.0050, for K2ABC was 0.0733 ± 0.0031, for 
ABCSMC was 0.0755 ± 0.0032 and for K2ABCSMC was 0.0747 ± 0.0041. Eor ABC, K2ABC, ABC SMC and K2ABC 
SMC, notice how the RMSE decreases when the number of observations increases. 

4.2 Results from nonlinear ecological dynamic system 

In Pig. |3] we present the posterior distribution for the parameters, obtained for the different methods when applied to the 
blowfly dataset of section [3 Pigs. |3(a)[ |3(b^ |3(c)[ |3(d)[ |3(e)| and |3(f)| illustrate the posterior of P, d, Nq, ad- ap and 
r, respectively. In these figures, the black dashed line corresponds to the posterior of the each parameter using K2ABC, 
the blue dashed line is the posterior obtained by PABC, the red dashed line is the posterior employing ABC, the posterior 
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(d) ABC SMC (e) K2ABC SMC (f) PABC SMC 


Figure 1: Estimated posterior mean of model parameters for the uniform mixture model, using all ABC methods. The integer number in 
the X axis represents the subindex i in the parameter 6i. 
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Figure 2: Root-mean-square error for different number of observations. We start to increase the observations from 40 to 400, in steps of 
5 observations. In Fig. |2(a)[ the circles represent the RMSE obtained by ABC, the diamonds are the RMSE for K2ABC and the triangles 
are the RMSE values using PABC. In Fig. |2(b)| the circles represent the RMSE obtained by ABC SMC, the diamonds are the RMSE for 
K2ABC SMC and the triangles are the RMSE values using PABC SMC. 
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obtained by K2ABC SMC is the cyan dashed line, the magenta dashed line is the posterior using PABC SMC and the green 
dashed line is the posterior obtained by ABC SMC. The posterior distributions obtained by using all ABC methods for d, 
ad- ap and Nq are similar. For P and r, the posterior obtained by K2ABC SMC, PABC SMC and ABC SMC are not similar 
with respect to the posterior using K2ABC, PABC and ABC; it is due to the parameter values, since this nonlinear model 
has a specific dynamical range from stable equilibrium to chaos. 



Figure 3: Posterior distribution for parameters of the nonlinear ecological dynamic system using ABC, K2ABC, PABC, ABC SMC, 
K2ABC SMC and PABC SMC. 

To quantify the performance of all methods over the estimated parameters, we compare the time series obtained using the 
posterior means of the parameter and the observed data. For this comparison, we use a cross-correlation coefficient (p) 
between simulated data V and observed data 22. The cross-correlation coefficient measures the similarity between two 
signals. We drew 100 subsets of parameters, we then apply all ABC methods, obtaining 100 estimated posterior mean for 
the parameters, with these estimated parameters we obtained 100 time series for these sets of signals, we compute p, we then 
sort p in descending order and choose the 50 first values, for all methods. Fig. |4]contains a box plot for the cross-correlation 
coefficient (p) using the different methods. 

From Fig. |4] notice that PABC presents the highest median for p, with a value of 0.6501. We also observe that PABC SMC 
obtained a median of 0.6360 for p. K2AB CSMC presents a median of 0.6277, and for ABC SMC the obtained median for 
p was 0.6220. Finally, K2ABC and ABC obtained a median of 0.6138. 

5 Conclusions 

We introduced a new metric for comparing two data distributions in a RKHS, using smoother density estimators to compare 
empirical data distributions, and then highlight the accepted samples by employing ABC methods. We demonstrated that our 
method is a robust estimator of the parameter vector in terms of the number of observations. Finally, we showed for a real 
application that our method obtained the best similarity with respect to the observed data, in an application involving time- 
series. As future work, it would be possible to propose a new dissimilarity distance using RKHS for different applications 
like electrical networks analysis. 
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Figure 4: Boxplots of the 25th and 75th percentiles of the cross-correlation coefficient, using all ABC methods. We drew 100 subsets of 
parameters, we then apply all ABC methods, obtaining 100 estimated posterior mean for the parameters, with these estimated parameters 
we obtained 100 time series for these sets of signals, we compute p, we then sort p in descending order and choose the 50 first values, for 
all methods. 
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